AITopics | marginal distribution

Identifiable Bayesian Deep Generative Copulas with Unknown Layer Widths for Data with Arbitrary Marginal Distributions

arXiv.org Machine Learning, May-28-2026

Deep generative models offer powerful tools for multivariate data analysis, but their black-box architectures are often unidentified and difficult to interpret. We introduce the Deep Discrete Encoder (DDE) Copula, an identifiable and interpretable generative model for multivariate data with arbitrary marginal distributions. The model places a hierarchical directed network of binary latent variables inside a copula framework, enabling flexible dependence modeling for mixed discrete and continuous data. Estimation is based on rank likelihoods, which decouple marginal modeling from posterior inference on the DDE parameters and avoid specifying the marginal distributions. We establish conditions for identification of the DDE copula parameters, ensuring that layer-specific parameters provide meaningful summaries of multivariate dependence. We also prove quotient-space posterior consistency for continuous margins under the exact rank likelihood and treat the extended rank likelihood for tied or mixed margins as a generalized likelihood, with concentration under an additional contrast condition. For computation, we propose a stochastic expectation-maximization algorithm for \emph{maximum a posteriori} estimation, together with initialization strategies that improve convergence. To learn network dimension adaptively, we extend Bayesian rank-selection priors to infer layer-specific widths. Simulations show strong finite-sample performance, and a personality-survey analysis reveals interpretable hierarchical latent structure in complex multivariate data.
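Rank-based estimation can avoid specifying the margins because ranks are unchanged by strictly increasing transformations of each variable. The sketch below checks that invariance numerically with NumPy. It is an illustration only, not the DDE Copula estimation procedure; the helpers `ordinal_ranks` and `rank_matrix` and the simulated data are assumptions introduced here.

```python
import numpy as np

def ordinal_ranks(column):
    """Return ranks 1..n of a 1-D array (this sketch assumes no ties)."""
    order = np.argsort(column, kind="stable")
    ranks = np.empty(len(column), dtype=int)
    ranks[order] = np.arange(1, len(column) + 1)
    return ranks

def rank_matrix(data):
    """Apply ordinal_ranks to every column of an (n, d) array."""
    return np.column_stack(
        [ordinal_ranks(data[:, j]) for j in range(data.shape[1])]
    )

rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))

# Strictly increasing transforms change the margins but not the ordering.
observed = np.column_stack([np.exp(latent[:, 0]), latent[:, 1] ** 3])

# True when the column-wise ranks agree before and after the transforms.
ranks_unchanged = np.array_equal(rank_matrix(latent), rank_matrix(observed))
print(ranks_unchanged)
```

Any inference that depends on the data only through these ranks is therefore unaffected by how each margin is distributed, which is the sense in which marginal modeling is decoupled from the dependence parameters.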

artificial intelligence, likelihood, machine learning, (20 more...)

2605.27523

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Health & Medicine (0.46)
Government (0.46)

Measuring Differences between Conditional Distributions using Kernel Embeddings

Moskvichev, Peter, Chau, Siu Lun, Sejdinovic, Dino

arXiv.org Machine Learning, May-5-2026

Comparing conditional distributions is a fundamental challenge in statistics and machine learning, with applications across a wide range of domains. While proposed methods for measuring discrepancies using kernel embeddings of distributions in a reproducing kernel Hilbert space (RKHS) provide powerful non-parametric techniques, the existing literature remains fragmented and lacks a unified theoretical treatment. This paper addresses this gap by establishing a coherent framework for studying kernel-based methods to measure divergence between conditional distributions through what we refer to as conditional maximum mean discrepancy (CMMD). The CMMD consists of a family of metrics which we call levels, with three special cases each using a different type of RKHS embedding: CMMD$_0$ (conditional mean operators), CMMD$_1$ (conditional mean embeddings), and CMMD$_2$ (joint mean embeddings). We additionally introduce a general level $s$ CMMD, clarifying the required assumptions, and establishing mathematical connections between the levels through the lens of operator-based smoothing. In addition to reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that maintains consistency provided at least one of the underlying models is correctly specified. We provide numerical experiments demonstrating that the CMMD effectively captures complex conditional dependencies for statistical testing.
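As background for the kernel-embedding idea, the sketch below computes the standard (unconditional) squared maximum mean discrepancy between two samples with a Gaussian kernel, using the biased V-statistic estimator. It is not the CMMD or any estimator from the paper; the function names, the bandwidth value, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_same = rng.normal(size=(200, 1))            # same distribution as x
y_shifted = rng.normal(loc=3.0, size=(200, 1))  # mean shifted by 3

mmd_same = mmd2_biased(x, y_same)
mmd_shifted = mmd2_biased(x, y_shifted)
print(mmd_same, mmd_shifted)
```

The estimate should be close to zero for samples from the same distribution and clearly larger for the shifted sample. Conditional variants replace these mean embeddings with embeddings of conditional distributions.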

artificial intelligence, estimator, machine learning, (17 more...)

2605.0226

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

16c0d78ef6a76b5c247113a4c9514059-Supplemental.pdf

Neural Information Processing Systems, May-1-2026, 01:50:48 GMT

artificial intelligence, hmm, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

00a03ec6533ca7f5c644d198d815329c-AuthorFeedback.pdf

Neural Information Processing Systems, Apr-30-2026, 19:48:15 GMT

artificial intelligence, estimation, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

e3fea99df80195b316cefa7aa6099cd5-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 02:28:12 GMT

artificial intelligence, machine learning, stability, (18 more...)

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Safe, Scalable, and Accurate Bayes Posterior Sampling for Large-Data Generalized Linear Mixed Models

Baek, Youngsoo, Berchuck, Samuel I.

arXiv.org Machine Learning, Apr-30-2026

We consider the problem of scalable sampling algorithms to fit Bayesian generalized linear mixed models on large datasets. Stochastic gradient Langevin dynamics, coupled with smooth re-parameterizations of variance parameters, produces divergent Markov chains and cannot be reliably used for sampling covariance parameters of random effects. We advocate the use of a mirror Langevin dynamics algorithm, propose the novel stochastic mirror Langevin dynamics based on data subsampling, and provide concrete guidelines for its use in a Bayesian inference framework. Based on an explicit Wasserstein distance error bound between the posterior and its algorithmic approximation, we propose a post-processing step that yields an asymptotic, order-wise correct estimation of the posterior variance, eliminating the irreducible posterior variance estimation bias due to subsampling. Empirical performance of the method is evaluated through simulated experiments and a longitudinal study of pain trajectories in a study of breast cancer survivors.

artificial intelligence, bayesian inference, machine learning, (14 more...)

2604.26029

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

d5470483dd38f71f7bd9e68ce1b94145-Paper-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 21:51:36 GMT

artificial intelligence, machine learning, representation, (12 more...)

Country: Asia (0.14)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

517da335fd0ec2f4a25ea139d5494163-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 21:42:19 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)

0b9e57c46de934cee33b0e8d1839bfc2-Supplemental.pdf

Neural Information Processing Systems, Apr-24-2026, 15:34:42 GMT

We use Law(X) to denote the distribution of a random variable X. When ν is a probability distribution over a set Ω and A is a subset of Ω, we use ν(A) to denote the probability that the random variable X belongs to A when X is sampled from ν. Similarly, the marginal distribution on the next N random variables for fγ,r#ν is fγ,r#ν2. We have thus proved equation (22), and Lemma 3 is proved. Lemma 4. Suppose Z1(· | A) and Z2(· | A) are two conditional distributions with range R^N, and for all values of a ∈ Ω, Wp(Law(Z1(a)), Law(Z2(a))) ≤ c (23), where Z1(a) denotes Z1(· | A = a) and Z2(a) denotes Z2(· | A = a).
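The quantity Wp in the excerpt is the p-Wasserstein distance between two laws. As a concrete, much simpler case than the R^N setting of the lemma, the sketch below computes the p-Wasserstein distance between two one-dimensional empirical distributions with equal sample sizes, where the optimal coupling pairs sorted samples. The function name `wasserstein_p_1d` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def wasserstein_p_1d(x, y, p=2):
    """p-Wasserstein distance between two 1-D empirical distributions.

    Assumes equal sample sizes, so the optimal coupling matches the
    i-th smallest value of x with the i-th smallest value of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
shifted = sample + 2.0  # every point moved by the same amount

distance = wasserstein_p_1d(sample, shifted, p=2)
print(distance)
```

Because every point is translated by the same constant, the distance here equals the size of the shift, which gives a quick sanity check on the implementation.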

artificial intelligence, constraint, joint distribution, (16 more...)

Technology: Information Technology > Artificial Intelligence (0.94)

Maximum Likelihood Training of Score-Based Diffusion Models

Neural Information Processing Systems, Apr-24-2026, 15:17:19 GMT

Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based diffusion models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based diffusion models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based diffusion models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.83 and 3.76 bits/dim on CIFAR-10 and ImageNet 32×32 without any data augmentation, on a par with state-of-the-art autoregressive models on these tasks.

artificial intelligence, likelihood, machine learning, (17 more...)

Country: